A New View on Geometry Optimization: the Quasi- 
Independent Curvilinear Coordinate Approximation* 
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This article presents a new and efficient alternative to well established algorithms for molecu- 
lar geometry optimization. The new approach exploits the approximate decoupling of molecular 
energetics in a curvilinear internal coordinate system, allowing separation of the 3iV-dimensional 
optimization problem into an O(N) set of quasi-independent one-dimensional problems. Each un- 
coupled optimization is developed by a weighted least squares fit of energy gradients in the internal 
coordinate system followed by extrapolation. In construction of the weights, only an implicit depen- 
dence on topologically connected internal coordinates is present. This new approach is competitive 
with the best internal coordinate geometry optimization algorithms in the literature and works well 
for large biological problems with complicated hydrogen bond networks and ligand binding motifs. 
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INTRODUCTION 

Recently, iV-scaling electronic structure algorithms (N 
is the system size) are beginning to realize their promise, 
delivering energies and forces on nuclei for large systems 
at both a numerical accuracy and theoretical level suit- 
able to begin addressing many complex problems. For 
example, it is now possible to routinely apply hybrid 
Hartree-Fock/Density Functional theory to fragments of 
biological molecules involving several hundred atoms us- 
ing basis sets of triple-zeta plus polarization quality. De- 
spite favorable scaling properties, however, 0{N) meth- 
ods typically carry a larger cost prefactor than their con- 
ventional counterparts. Thus, in addition to parallelism, 
developing efficient algorithms to drive the core engines 
of linear scaling electronic structure theory is paramount. 
Of these, perhaps the most useful driver of electronic 
structure theory is the molecular geometry optimizer, 
which follows the energy gradient downhill to a local min- 
imum. 

We present a new concept for the optimization of 
molecular geometries based on the approximate separa- 
bility of molecular motions in curvilinear internal coordi- 
nate systems, and the recent availability of fast transfor- 
mations to and from these coordinate systems for large 
molecules 0, S ■ Internal coordinates represent the 
internal motions of a molecule, involving bond stretches, 
angle bendings and dihedral torsions. These motions cor- 
respond to different energy and length scales, leading to 
strong diagonal dominance in the matrix of second order 
energy derivatives (the Hessian) as well as in higher order 



anharmonic energy tensors. Off diagonal elements of the 
internal coordinate Hessian are typically an order of mag- 
nitude smaller than the diagonal ones 0, El 0, E 13 This 
is the basis for an approximate decoupling of molecular 
motions in terms of internal coordinates. 
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FIG. 1: Progression of gradients on a valence angle bending 
coordinate of ammonia. Energies and forces have been ob- 
tained by the PBE density functional model in STO-3G basis 
set. The optimization was started from near planar geometry, 
i.e. from the vicinity of a transition state, and converged to a 
local minimum of the potential energy surface. The numbers 
in the picture indicate the sequence of optimization steps. 
The dashed line represents a linear fit, the star the predicted 
location of the minimum. 



An observation central to the present contribution is 
that internal coordinate gradients show trends during 
optimization that can be captured by curve fitting and 
used to predict minima by one-dimensional extrapola- 
tion. This is illustrated in Fig. ^ showing a typical pro- 
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gression of internal coordinate gradients. 

In the following, we review conventional methods of 
geometry optimization (Section II), and then develop 
our new idea conceptually (Section III). A practical im- 
plementation of this concept is then outlined (Section 
IV) in the context of linear scaling quantum chemistry. 
Next, this implementation is tested against Baker's small 
molecule test suite and applied to a reaction model of 
Protein Kinase A (Section V). 

Conventional Methods 

Over the last three decades, internal coordinate ge- 
ometry optimizers have become the standard tools for 
finding local minima when energies and forces are calcu- 
lated by electronic structure theory. Internal coordinates 
describe the internal motions of molecules with chemical 
degrees of freedom such as bond-stretches, valence an- 
gle bendings and the torsions of dihedral angles. These 
curvilinear coordinates are constructed from non-linear 
functions of the Cartesian coordinates of atomic nuclei 

The reduced vibrational coupling achieved with in- 
ternal coordinates provides an advantage in molecular 
geometry optimization Q and molecular dynamics 0. 
Specifically, a full Newton step in Cartesian coordinates 
will typically generate a larger, anharmonic coupling rel- 
ative to a full Newton step carried out in internal coordi- 
nates. An excellent discussion of this issue can be found 
in the Introduction of Ref. 0- 

Today, the most aggressive internal coordinate algo- 
rithms have been highly optimized for small molecule 
quantum chemistry. These methods typically generate a 
Cartesian Hessian, either exactly or approximately. This 
Cartesian Hessian is then transformed into an internal co- 
ordinate representation, and an internal coordinate New- 
ton step is taken. The internal coordinate step is then 
transformed back to a Cartesian displacement and used 
to increment the molecular geometry. 

There are many variations on this approach. Cartesian 
Hessians are usually calculated by updating algorithms 
in the framework of the variable metric or quasi-Newton 
approach such as the Broyden-Fletcher-Goldfarb-Shanno 
(BFGS) method Ej|. With increasing system size, stor- 
ing, transforming and potentially inverting a full Hes- 
sian matrix scales as 0(N 3 ), and it becomes favorable 
to consider diagonal or semi-diagonal approximations to 
the internal coordinate Hessian. This severely truncated 
Newton or "force relaxation" method typically obtains 
static diagonal values a priori from spectroscopic data or 
empirical force-fields 0,B II S H 111 El 

It is worth noting that the landmark RHF/4-21G op- 
timization of Crambin by Van Alsenoy was carried 
out in just 79 steps using the force relaxation approach of 
Sellers ^2- I n these algorithms, strong damping ^2 or 



line searching |15| is required to avoid divergence. An- 
other important technique to accelerate convergence of 
geometry optimization is the geometric DIIS (GDIIS) 
jlfij . While geometric DIIS is a simple technique, its 
successful application may require many intricate modi- 
fications [ljj • A recent review and comparison of these 
algorithms and their technical details can be found in 
Ref. H 

A NEW CONCEPT 

In the Newton-Raphson (NR) approach to geometry 
optimization, a correction <5x to the Cartesian vector x of 
nuclear positions is found by solving a local second order 
expansion of the energy function, yielding 5x = — g-h -1 , 
where h is the Hessian, a matrix of energy second deriva- 
tives with respect to nuclear displacement, and g is the 
gradient vector of energy first derivatives. Thus, a step 
along a specific element of x is coupled to all components 
of g through off-diagonal elements of h. Alternatively, 
it is possible to work in the principal axis system of h, 
or normal modes, which derive from eigenvectors of the 
Hessian. On an ideal harmonic surface the NR method 
remains exactly decoupled to all orders in the basis of 
normal modes, and also achieves quadratic convergence. 

Now consider an approximate NR algorithm in a ba- 
sis of diagonalizing normal modes, applied to a non-ideal 
anharmonic surface. For the moment, assume that the 
normal modes remain diagonalizing with each geometry 
step, but the eigenvalues change. Then, if the geome- 
try steps are small, eigenvalues can be re-computed with 
backward differences of the gradient information at each 
step. With infinitesimal steps, this independent coordi- 
nate approximation is formally equivalent to NR. 

Realistically though, both eigenvectors and eigenval- 
ues of the Hessian will change as the algorithm takes the 
largest possible steps, leading to second (and higher) or- 
der coupling effects. At this point, we turn to an internal 
coordinate system to continue development of the inde- 
pendent coordinate approximation. In an internal coordi- 
nate system, the magnitude of harmonic and anharmonic 
coupling effects are small over the course of optimization. 
On the other hand, in a static basis of normal modes an- 
harmonic coupling effects are larger to begin with , and 
coupling will increase during the course of optimization 
unless the normal modes are periodically recomputed, 
incurring an unacceptable cost for large systems. 

Extending the independent coordinate approximation, 
it is clear that a differencing scheme will be inaccurate 
due to large step sizes. Also, because coordinate indepen- 
dence is an approximation, coupling effects will lead to 
scatter in the gradient data. Replacing the differencing 
scheme with a curve fit has the potential to both smooth 
the data and provide higher order accuracy. Using the 
local tangent of a fitted curve to approximate diagonal 
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elements of the Hessian would be similar to force relax- 
ation H 0, , but with constantly updated values of 
the Hessian. However, the curve fit can be carried fur- 
ther; forsaking an approximate Newton step, the curve fit 
can be used instead to extrapolate or interpolate a zero 
for the parameterized gradient data. To first order (a line 
fit), this is equivalent to a one-dimensional Newton step 
with a fitted gradient and a fitted Hessian. While the 
approach outlined so far works well, the use of standard 
least squares is not optimal due to the ability of outliers 
to skew the fit. A significant advance can be realized by 
instead using a weighted curve fit, with the weights de- 
pending indirectly on estimation of local coupling effects. 

We call this concept the "QUasi Independent Curvi- 
linear Coordinate Approximation" or QUICCA. 

Explicitly then, application of QUICCA to geometry 
optimization involves the projection of Cartesian gradi- 
ent information onto an uncoupled iV^t dimensional in- 
ternal coordinate space. Energy gradients are accumu- 
lated for each internal coordinate, as shown in Fig.^ and 
used to formulate N- nlt independent curves by weighted 
least squares. At each step, these curves are extrapo- 
lated (or interpolated) to zero, with collective changes 
in the internal coordinate system taken as an approx- 
imate minima for the system as a whole. The final 
back-transformation of the step to Cartesian coordinates 
merges these displacements, reconciling any redundan- 
cies in the internal coordinate system. The quality of this 
collective extrapolation is determined by severity of the 
low order coupling, redundancy of the internal coordinate 
system and the accuracy of the fit. The QUICCA ap- 
proach to geometry optimization is simple to implement 
and scales linearly with system size. It overcomes low or- 
der coupling effects through averaging and weighting of 
the curve fit and also admits the accumulation of anhar- 
monic information. However, augmenting QUICCA with 
anharmonic quadratic and higher order terms and pre- 
serving the important smoothing effect of the weighted 
fit is an advanced subject beyond the scope of this work. 

IMPLEMENTATION 
Recognition of Internal Coordinates 

The present implementation of QUICCA employs a re- 
dundant set of primitive internal coordinates. Similar co- 
ordinate systems are used by a majority of other internal 
coordinate optimizers [a,LDjllSll20,l21|- The generation of 
these coordinates starts with the recognition of covalent 
bonds, based on atomic Slater radii [22| multiplied by an 
empirical factor of 1.3. Weak bonds are recognized on the 
basis of atomic van der Waals (VDW) radii [23j . Note, 
that no VDW contact is allowed between atoms, which 
are second or third neighbors in the covalent bonding 
scheme. The search for neighboring atoms is carried out 



by dividing the system into small cubes a few angstroms 
on a side, and bonds are looked for only within each 
box and between nearest neighbor boxes. This divide- 
and-conquer scheme ensures the fast and linear scaling 
recognition of bonding for large molecules. The bonding 
scheme is then stored as the symbolic component (graph) 
of the sparse topology matrix. 

At each step of the optimization, the bonding scheme 
is determined and the corresponding primitive topology 
matrix is stored. Then, primitive topology matrices from 
the net most recent steps are summed to yield a merged 
topology matrix, where ngt is the same number of points 
employed by the curve fitting. In the current implemen- 
tation rtfit = 7, but it can be much larger or smaller. 

The merged topology matrix avoids fluctuations due 
to the possibility of bond-formation and bond vanishing 
during the optimization of large molecules. Use of the 
merged topology matrix increases the number of inter- 
nal coordinates and the redundancy of the coordinate 
set only slightly; the final merged topology matrix is de- 
termined directly from primary topology matrices rather 
than the merged ones at previous cycles. 

Based on the topology matrix of the merged bonding 
network, all other primitive internal coordinates, such as 
bond angles, torsions, linear bendings and out-of-plancs 
are recognized. Torsions are selected so that each bond 
will be associated with only a single torsional coordinate. 



Preparing the Data 

Cartesian coordinates and the corresponding Carte- 
sian gradients of the nfi t = 7 most recent geometries are 
read from disk. Based on the most recent definition of 
the bonding scheme they are transformed into the inter- 
nal coordinate representation, determined by the merged 
bonding scheme. The coordinate transformation is car- 
ried out as described in Ref. Q, while the necessary inter- 
mediate sparse matrices of the coordinate transformation 
are re-generated. This is a very inexpensive step, due to 
the extreme sparsity of the vibrational B matrix (lfij . 
Thus the primary objects of the proposed geometry op- 
timization, the internal coordinates and their gradients, 
are re-created at each step allowing on-the-fly redefinition 
of the coordinate system. 

In order to avoid problems due to the periodicity of 
torsional angles, reference points are used, eg. the tor- 
sional angle values of the most recent geometry. Each 
torsional angle is set to its periodic equivalent closest to 
the reference point, by a shift of ±2ir, while its gradient 
value does not change. 
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The Weighted Curve Fit 

At this point, we have a set of ng t coordinate - gradient 
pairs for each individual internal coordinate, such as the 
one presented in Fig. ^ These data pairs parametrize 
progression of the gradient during the ng t most recent 
steps of the optimization. This data is now used to 
fit curves for each of the N[ nt coordinates using the 
SLATEC routine POLFIT and its dependencies 0, 
which perform a weighted least squares fit. 

The choice of weight is a key factor in the behavior of 
the QUICCA algorithm, as it is used to prioritize reli- 
able portions of the data and to generally smooth noisy 
trends. Originally, we considered the standard expres- 
sion wl = (l/g { i ] ) for the weight of the fc-th internal 
coordinate at the i-th geometry, which is generally rec- 
ommended for use in curve fitting |25l |. These weights 
however do not account for the fact that the gradient 
of the fc-th internal coordinate may be small, while its 
immediate environment may contain large gradients indi- 
cating the possibility of large changes in the fc-th internal 
coordinate due to coupling effects. Thus, a data point 
with high environmental tension is less reliable than a 
point with neighboring coordinates that are relaxed. Lo- 
cal sums of the gradient squares, 
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incorporate this effect, where / indexes internal coordi- 
nates that have atoms in common with the /c-th internal 
coordinate, and H u is an approximate, static approxi- 
mation to the diagonal Hessian. The terms Hu yield 
dimensionally consistent sums, since each of the gradient 
components gi has a different measure and energy scale 
for stretches, bendings and torsions. 

For each coordinate k, the sum over I is determined 
by the topology matrix, given by the symbolic structure 
(graph) of the sparse matrix Gi = BB l , which obtains 
from the sparse Wilson's B matrix [Jjj- We have consid- 
ered more delocalized interactions as well, such as those 
deriving from the graph of G\ , but they did not provide 
any significant improvement in test calculations. 

Values of 1.0 au for stretches, 0.1 au for bendings (and 
linear bendings and out-of-planes) and 0.01 au for tor- 
sions are used for the Hu. These values are similar to 
those used to provide an initial guess for the Hessian in 
variable metric algorithms Note, that only the rel- 
ative values of Hu are important for the fitting process, 
as the fitting parameters are determined by the relative 
values of the weights. 

While it is possible to employ second and third order 
polynomials in the fit, in our current implementation ex- 
trapolation to zero is most robust with just a line. A 
primary difficulty in the construction of higher order fits 



is that distinguishing between noise and signal is more 
difficult. Also, there are stability issues of extrapolation 
versus interpolation with higher order polynomials. Thus 
while we believe there is opportunity for higher order ap- 
proximations, their development is beyond the scope of 
the present paper. 



Minimum or Transition State? 

Transition states are indicated by a gradient curve with 
negative tangent at the position of the root. For example, 
in Fig. ^the tangent of the gradient curve is negative in 
the vicinity of the planar structure (structure #1), which 
is a transition state. The tangent of the gradient curve 
thus offers a convenient tool to control convergence to 
either a minimum or transition state. Whenever the local 
tangent of the gradient curve is negative, the optimizer 
employs a simple force relaxation step, 
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where i denotes the serial number of the optimization 

(i) 

step, (fk is the k-ih internal coordinate, H^ is a diagonal 
element of the Hessian determined from local tangent of 
the curve fit, and gi is the local gradient at i (not a 
fitted gradient). This avoids transition states that are 
well localized to just a few internal coordinates. The 
ability of the present implementation to treat delocalized 
transition states remains to be seen. 

Combined internal coordinates, such as the natural Q 
or the delocalized internals 26], hold the possibility of 
identifying delocalized transition states by gradient fit- 
ting. An analysis by Wales et.al. H^] provides insight 
into the numerical problems involved in controlling con- 
vergence to a minimum without an accurate Hessian. 
Note that, even with a good quality Hessian update, such 
as BFGS there is no guarantee to converge into a 
minimum instead of a transition state. This is because 
positive dcfinitcncss of an approximate Hessian does not 
guarantee the positive definiteness of the exact Hessian. 
Thus, the exact Hessian remains the only absolute for 
determining the character of extremal points in the po- 
tential energy surface [3. 



Initial steps 

In the very first step of the optimization, a force relax- 
ation step is used with the usual diagonal Hessian esti- 
mates of 0.5 au for stretchings, 0.2 au for bendings and 
0.1 au for torsions. This initial step is equivalent with 
the one traditional optimizers use. 
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Step size control and backtracking 

Experience with QUICCA has shown that extrapola- 
tions larger than the range spanned by the fitted data 
points are unreliable. Thus, while QUICCA always tries 
to achieve the predicted minimum, stepsize control is im- 
posed, limiting the step by the fit range or to 0.3 au, 
whichever is less. This maximum allowed value of the 
stepsize, 0.3 au, is similar to that used by other optimiz- 
ers Hi 

The range spanned by the fitted data is in some sense 
an uncertainty. Refining and formalizing quantification 
of this uncertainty is another area of opportunity for im- 
provement of the QUICCA optimizer, which is beyond 
the scope of this article. We note however, that it might 
be more effective to use a filtered range of the data points, 
based on their weights. For example, compute the range 
including only those points with weights that comprise, 
say, 99.99 percent of the total. Alternatively, statisti- 
cal tests could be employed to filter out outlying points. 
We note that issues of uncertainty in stepsize control are 
certain to be far more complex for higher order fits, and 
advanced statistical methods may have a significant role 
to play in this area of development. 

Backtracking 

While the above stepsize control is designed to avoid 
potential instabilities, it is still possible to take steps that 
will increase the total energy. In this case, there are a 
number of options. We can live with the increase, hoping 
it will nevertheless improve the information content used 
to build the energy surface. We can backtrack, uniformly 
reducing the step, typically by half, and re-evaluate the 
energy to see if it decreases. Alternatively, we can em- 
ploy a more sophisticated and expensive line search to 
determine a minimizing step. 

There are many ways to carry out a line-search. Per- 
haps the most widely employed approach is the endpoint 
algorithm of Schlegel |l5l |. This algorithm employs en- 
ergies and gradients at the endpoints to yield a quartic 
interpolant. This approach will work well if the inter- 
polant varies smoothly between endpoints. We have im- 
plemented this approach to the line-search, but found 
that it does not perform reliably, especially when large 
steps are made. It may be that midpoint algorithms are 
more appropriate, but we have not explored this further. 

Our experience with QUICCA is that simple back- 
tracking is both cheaper and more effective than the 
endpoint line-search. During back-tracking, a bad step 
is recognized by an elevated energy before any gradient 
evaluation takes place. Then, the step-size is halved, and 
a new energy is calculated. This process is repeated un- 
til the energy becomes lower, typically requiring just one 
halving. 



Iterative back-transformation 

Once predictions of internal coordinate displacements 
are made, they are passed to the iterative back- 
transformation to generate a new set of Cartesian coor- 
dinates, as in Ref. |a However, there is no guarantee that 
there exists a set of Cartesian coordinates that can ex- 
actly realize the new, predicted internal coordinates. For 
example, a prediction of angles in a triangular molecule 
can sum to a value greater than or less than 180 degrees. 
Such conflicts are a consequence of the redundant inter- 
nal coordinate system. 

Displacements in the updated internal coordinate sys- 
tem that result in geometric conflicts will be projected 
out during the iterative back-transformation. How- 
ever, if these conflicts are too large, the iterative back- 
transformation may fail to converge, or it may produce 
a set of Cartesian coordinates that violates the predicted 
internal coordinate values. Typically, these violations are 
big when stepsizes are large, but become small close to 
convergence. 

One approach to avoiding this problem is to work 
with linear-combinations of primitive internal coordi- 
nates, such as the natural Q or delocalized [2f| in- 
ternal coordinates, which minimize geometric conflicts. 
Another cure for redundancies is to apply a projection 
technique prior to the iterative-back transformation ||, 
which can eliminate problems to first-order, but does not 
guarantee convergence for large steps. 

In our present implementation, geometric conflicts are 
dealt with solely by the iterative back-transformation. 
In the case of non-convergent back-transformation, our 
algorithm will attempt the back-transformation again, 
but with the original internal coordinate displacements 
halved. If the repeated transformation also fails, it usu- 
ally indicates an incomplete set of internal coordinates 
and the program stops. An incomplete set can also 
scuttle convergence of the Cartesian to internal gradient 
transformation 2]. Note, that an appropriate internal co- 
ordinate recognition program together with suitable in- 
ternal coordinate step-size control can always avoid the 
non-convergent back-transformation. 



Realization 

The QUICCA geometry optimization algorithm has 
been implemented in the object oriented FORTRAN- 
90/95 programming language and is part of the Mon- 
doSCF suite of linear scaling quantum chemistry codes 
[28| . It has been successfully tested on different plat- 
forms, compilers and operating systems. 
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Scaling 

At each geometry step, the cost of QUICCA is asso- 
ciated only with recognition of internal coordinates, the 
coordinate transformations and the fitting process. The 
recognition of internal coordinates scales linearly, as de- 
scribed in Section IV A. The effort spent on fitting is 
0(N{ nt ) with jVi n t oc iVatoms, &s the number of points 
employed by each fit is a constant. Likewise, the coor- 
dinate transformations scale linearly with system size as 
pointed out in Ref. 0- 



RESULTS AND DISCUSSION 

Baker's test set 

The performance of the QUICCA optimizer has been 
tested on Baker's suite [l9( of small test molecules, and is 
listed in Table [IJ along with values from more traditional 
algorithms. At convergence, the magnitude of the max- 
imum Cartesian force vector is required to be less than 
3 x 10 -4 au for each atom. This criterion is tighter than 
that employed in other studies Ref. flil which require 
only that the maximum Cartesian force component (X, 
Y or Z) be less than 3 x 1CU 4 au. In addition to this cri- 
terion, an absolute energy change less than 1 x 1CU 6 au 
or a predicted internal coordinate displacement less than 
3 x 10 -4 au at the last geometry is required. The energies 
of the optimized structures have been reproduced to all 
digits given in Ref. 1 191 

The current implementation of QUICCA, achieving a 
total of 187 geometry steps for Baker's test suite, is com- 
petitive with Bakken's algorithm, requiring 185. 

Bakken's method (lij derives from an optimal combi- 
nation of advanced techniques related to internal coor- 
dinate methods based on a dense Cartesian Hessian up- 
date. It involves a Cartesian BFGS update the Ra- 
tional Function Optimization scheme [2jJ and a Newton 
trust-region model 11]. An important addition to these 
techniques is the extra redundant coordinates, which con- 
tribute to the success of Bakken's algorithm relative to 
more traditional schemes. Extra redundant internal co- 
ordinates are local distance coordinates, which bridge the 
terminal atoms of all valence angles and most of the di- 
hedral angles. 

In contrast, the current implementation of QUICCA 
employs a redundant primitive internal coordinate sys- 
tem, a weighted line fit and simple backtracking. 

In two cases (ACANIL01 and Pterin), the QUICCA 
optimizer proved to be considerably slower than the best 
traditional techniques. We believe these cases may be im- 
proved by introducing a more sensitive weighting scheme, 
and/or employing a more sophisticated internal coordi- 
nate system as discussed above in Section IV H. 



TABLE I: Number of geometry optimization steps to con- 
vergence of Baker's test set (up to 30 atom molecules) for the 
QUICCA optimizer and also for other advanced algorithms 
from the literature. Calculations have been carried out at the 
RHF/STO-3G level of theory. All of the reported calcula- 
tions used local primitive internal coordinates, except those 
of Bakken et.al. who used extra redundant coordinates fl^l . 



Molecule QUICCA Bakken Eckert Lindh Baker 
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Water 


4 


4 


4 


4 


6 


Ammonia 


4 


5 


6 


5 


6 


Ethane 


4 


3 


4 


4 


5 


Acetylene 


4 


4 


6 


5 


6 


Allene 


3 


4 


4 


5 


5 


Hydroxysulphane 


8 


7 


7 


8 


8 


Benzene 


2 


3 


3 


3 


4 


Methylamine 


4 


4 


5 


5 


6 


Ethanol 


5 


4 


5 


5 


6 


Acetone 


4 


4 


5 


5 


6 


Disilyl ether 


9 


8 


9 


11 


8 


1,3,5-trisilacycl. 


5 


9 


6 


8 


8 


Benz aldehyde 


5 


4 


5 


5 


6 


1,3-difluorobenz. 


4 


4 


5 


5 


5 


1,3,5-trifluorob. 


3 


4 


4 


4 


5 


Neopentane 


5 


4 


4 


5 


5 


Fur an 


6 


5 


6 


7 


8 


Naphtalene 


6 


5 


6 


6 


5 


1 , 5-difluoronapht . 


6 


5 


6 


6 


6 


2-hydroxibicyclop. 


7 


9 


9 


10 


15 


ACHTAR10 


10 


8 


9 


8 


12 


ACANIL01 


11 


7 


8 


8 


8 


Benzidine 


8 


9 


7 


10 


9 


Pterin 


12 


8 


9 


9 


10 


Difuropirazine 


6 


6 


7 


7 


9 


Mesityl oxide 


5 


5 


6 


6 


7 


Histidine 


11 


16 


14 


20 


19 


Dimethylpenthane 


6 


9 


10 


10 


12 


Caffeine 


7 


6 


7 


7 


12 


Menthone 


13 


12 


10 


14 


13 


Total 


187 


185 


196 


215 


240 



Note that these results have been achieved without 
the use of traditional convergence acceleration algorithms 
such as geometric DIIS (GDIIS) dES- It is of course 
entirely possible to deploy the GDIIS algorithm together 
with QUICCA, but we leave this avenue open for future 
work. 

Our experience shows, that QUICCA is relatively in- 
sensitive to the values of Hu employed in the construc- 
tion of weights (see Section IV C). For example the values 
0.5, 0.2 and 0.1 au for Ha (instead of our standard, 1.0, 
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0.1 and 0.01 au) resulted in 189 steps instead of 187 for 
Baker's test set. The largest difference was present in 
Dimethylpenthane, which converges 2 steps faster when 
using the standard Hu values. 

In Baker's test suite, about three force evaluations have 
been saved by using backtracking. Without the applica- 
tion of backtracking, Baker's test set converges in 190 
steps. This suggests that the core QUICCA algorithm is 
robust and does not often overstep. 

We can understand the stability achieved by the core 
QUICCA algorithm by considering a hybrid force relax- 
ation method, based on Eq. [21 m which the diagonal Hes- 

(i) 

sian elements are obtained from the curve fit, but 
the uncoupled Newton step is carried out with the cur- 
rent gradient. We have tested this alternative, but it 
turned out to be oscillatory in many cases, requiring ex- 
cessive backtracking. This should be contrasted with the 
full QUICCA algorithm, which uses extrapolation. To 
first order this is equivalent with an approximate New- 
ton step taken with the fitted gradient. Thus, the full 
QUICCA algorithm is successful not only because it gen- 
erates up-to-date curvature information, but because it 
achieves stability through smoothing effects derived from 
the weighted fit. 




FIG. 2: The structure of a 263 atom Protein Kinase A frag- 
ment used to test the QUICCA optimizer. 



Convergence 

Away from the basin of attraction, both steepest de- 
scents as well as the Newton- Raphson algorithm converge 
linearly. However, these methods behave very differently 
when condition number of the Hessian is large, for ex- 
ample when low frequency "floppy" modes are present. 



1 




.6 I ■ ■ ■ ■ ■ ■ 1 

20 40 60 80 100 120 140 

i 

FIG. 3: Convergence of the energy with geometry step i on a 
logio-linear scale for the 263 atom Protein Kinase A fragment 
at the RHF/STO-2G level of theory. 
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FIG. 4: Convergence of the root mean square of the Cartesian 
gradients, gi, with geometry step i on a logio-linear scale for 
the 263 atom Protein Kinase A fragment at the RHF/STO- 
2G level of theory. 

In this case, steepest descents can be extremely slow, 
while second order (Newton-Raphson) and approximate 
second order (quasi-Newton) methods achieve an accel- 
erated rate of convergence through conditioning of the 
stepsize. While methods that render the number of ge- 
ometry steps independent of system size do exist |30j . 
they apply (so far) only to homogeneous systems. In 
general though, the number of steps required to minimize 
to within a certain tolerance increases with system size, 
roughly as C(A at oms/3) according to Ref. Isil Therefore, 
the rate of linear convergence achieved by an optimiza- 
tion algorithm is perhaps a good measure of performance 
for large systems. 

An optimization algorithm may be characterized by 
the ratio AE i+1 /AEf = c, where AEi = \E, - E opt \ is 
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log 10 (E r E opt ) 

FIG. 5: A plot of log 10 A_Bi+i vs log 10 AEi, characterizing the 
ratio AE l+1 /AEi. The solid line is y = -0.04980+1.00111*2:. 
The slope of this line is equivalent to the order of convergence 
(n = 1.00111), while the y-intercept is the negative rate of 
convergence ( — / = log 10 (c) = —0.04980). 
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FIG. 6: Change in the rate of convergence / during opti- 
mization of the 263 atoms Protein Kinase A fragment. / is 
modeled by a 10th order polynomial. The polynomial coef- 
ficients have been obtained from a non-linear fit to the Ei-i 
data-pairs, using the model function Ei = £ pt + A * e~^*\ 



the absolute error at geometry step i, E opt is the extremal 
value of the energy, n is the order of convergence, and c 
is the convergence factor [3^. With first order conver- 
gence n = 1, and the error typically decays exponentially 
as AEi — e~f i * 1 , where / = — log(c) is the rate of con- 
vergence. In the basin of attraction, where the second 
order approximation to the nonlinear objective is accu- 
rate, the Newton-Raphson method achieves second or- 
der convergence, n — 2, while approximate second order 
methods (quasi-Newton algorithms for example) can at 
best achieve superlinear convergence 0, lll( . Superlinear 
convergence is characterized by n = 1 and an increasing 



rate of convergence /, equivalent to lim^^ a — 0. 

To investigate the convergence properties of QUICCA 
in a large application, we have carried out optimization of 
a 263 atom fragment of crystallographic Protein Kinase 
A, involving two Magnesiums, three crystallographic wa- 
ters, ATP and a fragment of the Protein Kinase Inhibitor 
to which phosphate is transfered. This is a model reac- 
tant system obtained from Ref . Issl and is shown in Fig. [21 
Optimization was performed at the HF/STO-2G level of 
theory with an approximate absolute energy resolution of 
10~ 5 hartree using MondoSCF j2^ at the tight accuracy 
level and with ten peripheral atoms constrained to mimic 
steric effects of the enzyme. Note that this minimal level 
of theory actually makes more work for the optimizer, as 
it will tend to move away from the more correct crys- 
tallographic structure. This structure was optimized to 
within a Cartesian force of 3 x 10~ 4 au and a maximum 
predicted displacement less than 4 x 10~ 3 au. The num- 
ber of internal coordinates, generated for this structure 
was 1280 at convergence. Figure |31 shows \og 10 AEi vs i 
for this system, Fig.0]the root mean square of the Carte- 
sian gradient at each step. 

In Fig. a plot of log 10 A_Ej + i vs log 10 Ai?i is shown, 
demonstrating only first order convergence with a rate 
/ ~ 0.05. To examine the rate of convergence in greater 
detail, we fit the the model function Ei — E opt +A*e~' l * : ' i , 
using an 11th order polynomial to represent /j. The fit- 
ted rate of convergence shown in Fig. suggests that 
the optimization has three different phases, in accordance 
with Fig. |21 In the first phase (steps 1-10) there is a rapid 
refinement of the least-optimal substructures. In the sec- 
ond phase (steps 10-60) the absolute energy decreases 
less rapidly to ~ 1 mhartree, with the main optimiza- 
tion effort corresponding to changes in the local internal 
coordinates. Finally, the third phase (60-120) occurs at 
a slower, constant rate of about / = 0.05, which is con- 
sistent with the fit characterizing the ratio AE^+i/ AEi 
shown in Fig. 

In the third phase, below 1 mhartree, backtracking be- 
comes necessary to achieve smooth convergence of the 
energy and the rate of convergence decreases to a con- 
stant. We believe that this deceleration is due to the 
presence of "floppy" low- frequency modes that are highly 
delocalized and not accounted for in the current local in- 
ternal coordinate system. While QUICCA can certainly 
be used with a more delocalized internal coordinate sys- 
tem, the construction of delocalized internal coordinates 
is an important open problem in molecular geometry op- 
timization. It is also important to note that the effect of 
floppy modes is not the only factor in the third phase of 
the optimization; the rigidity of the internal coordinate 
system and its role in the inaccurate convergence of the 
back-transformation is also important. In our opinion, 
these two factors, which characterize the last phase of 
the optimization may be related to each other. A recent 
work of Fernandez- Serra et.al |3J| calls attention to the 
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relationship of the rigidity of internal coordinate systems 
and the phenomenon of floppy modes. This relationship 
may be understood from the theory of rigidity [lEl . In 
the present work, we do not consider this relationship 
further. 



CONCLUSIONS 

We have presented a new concept for local optimiza- 
tion, the "QUasi Independent Curvilinear Coordinate 
Approximation" or QUICCA, and applied it to the ge- 
ometry optimization of molecular structures. Competi- 
tive with the most aggressive internal coordinate geom- 
etry optimizers in the literature, QUICCA has demon- 
strated that an efficient optimization algorithm can be 
constructed without explicit coupling in local expan- 
sion of the energy. However, coupling effects do enter 
QUICCA implicitly through the weighted curve fit, pro- 
viding an important averaging effect that compensates 
for irregularities related to the independent coordinate 
approximation. 

The implementation presented here is relatively sim- 
ple, employing redundant primitive internal coordinates, 
a weighted line fit and backtracking, allowing substantial 
room for improvement. Indeed, the QUICCA concept 
opens the door for a wide range of auxiliary techniques 
that may benefit from physical insight (such as the de- 
velopment of more precise weighting schemes) as well as 
from ideas in applied mathematics involving robust es- 
timation, interpolation and advanced methods of regres- 
sion. 

The primitive internal coordinate system employed in 
the present implementation does not include delocal- 
ized motions and suffers from slowdowns associated with 
floppy modes. The construction of dclocalized internal 
coordinate systems that remain approximately decoupled 
is thus an outstanding problem in molecular geometry 
optimization. Nevertheless, the preconditioning afforded 
by local stretchings, bendings and torsions is an impor- 
tant step, enabling the large scale application of linear 
scaling electronic structure theory to problems in biology 
and materials science. In addition, with the availability 
of fast linear scaling Cartesian/internal coordinate trans- 
forms 0], the QUICCA algorithm may also present ad- 
vantages in the optimization of large systems employing 
classical forcefields. 

The averaging scheme presented here, achieving an ef- 
fective decoupled internal coordinate representation of 
the potential energy surface, may also be useful in on-the- 
fly construction of classical force fields. In such an ap- 
proach, force field parameters such as the spring-constant 
and equilibrium position could be updated periodically 
from ab initio data, taken over several time steps. This 
on-the-fly construction of force fields based on QUICCA 
might provide a useful alternative to the force matching 



algorithm of Ercolessi et. al. 36] and to the reactive force 
field concepts developed by the Goddard group [13, • 
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